ABSTRACT. Diaconis and Sturmfels introduced an influential method to construct Markov 
chains using commutative algebra. One major point of their method is that infinite families 
of graphs are simultaneously proved to be connected by a single algebraic calculation. For 
large state spaces in the infinite families these Markov chains are not rapidly mixing and 
only ad hoc methods have been available to improve their mixing times. We provide a 
method to get rapid mixing by constructing expanders for the Diaconis-Sturmfels type 
Markov chains. 


1. The model problem 

In this text we discuss how to build expanders for a certain class of model problems. 
These problems were originally solved using the Markov bases designed by Diaconis 
and Sturmfels [DS], but the resulting chains are not rapidly mixing in this setting. The book by Drton, 
Sturmfels and Sullivant [DSS] is an excellent introduction to algebraic statistics. 

For basic facts about expanders and mixing time, we refer to the survey by Hoory, Linial 
and Wigderson [HLW]. The expanders in this paper are built using the zig-zag product 
introduced by Reingold, Vadhan and Wigderson [RVW]. Mixing time and expansion are 
properties of sequences of larger and larger graphs, not of single graphs. We are interested 
in graphs from Markov bases and they come naturally in sequences for fixed design 
matrices. In this paper we don't study sequences of graphs from different design matrices; 
allowing for any design matrices essentially doesn't restrict the sequence of graphs and 
it's hard to make useful statements in that general setting. 

1.1. The model problem. We are provided the following data: 

(i) A finite set $S$ of rational points in $[0,1)^d$. 

(ii) A full dimensional polytope $P$ in $\mathbb{R}^d$. 

For every positive integer $m$ we want to construct a connected graph $G_m$ whose vertex 
set is 

$(S + \mathbb{Z}^d) \cap mP$ 

and it should be as fast as possible to random walk on $G_m$. The main result of this paper is 
an explicit construction of graphs $G_m$ that are expanders by a straightforward application 
of the zig-zag product. 

1.2. Markov bases are not rapidly mixing for the model problem. The vertices of $G_m$ 
in our model problem have coordinates, so we can assign a length to each edge of $G_m$ 
from the Euclidean metric. We consider a wider context than that of Markov bases: assume 
that the edge lengths are bounded from above by some $\ell$ for all $G_m$, even as $m \to \infty$. 
Fix a hyperplane $H$ that divides $P$ into two equal volumes, such that the (codimension 
one) volume of $P \cap H$ is minimal among those hyperplanes. Removing all vertices of 
$(S + \mathbb{Z}^d) \cap mP$ that are within distance $\ell$ from $mH$ will cut the graph $G_m$ into two disjoint 
subgraphs of similar orders. The number of removed vertices grows like $m^{d-1}$ while the 
total number of vertices grows like $m^d$. Hence, as $m \to \infty$ the proportion of vertices removed from $G_m$ to 
cut it into two pieces tends to 0, and this proves via the Cheeger inequality that we don't 
have expansion. In 2.7 we make some remarks on this in practice. 
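
The counting behind this argument can be checked on a toy case. The following Python sketch is only an illustration under assumptions that are not in the text: $P$ is the unit square in dimension 2, $S = \{0\}$, the bisecting hyperplane is $x_1 = 1/2$, and the function name `removed_fraction` is hypothetical.

```python
from fractions import Fraction


def removed_fraction(m, ell):
    """Fraction of the lattice points of m*[0,1]^2 within distance ell of the line x = m/2."""
    total = (m + 1) ** 2  # lattice points (x, y) with 0 <= x, y <= m
    # |x - m/2| <= ell is tested as |2x - m| <= 2*ell to stay in exact arithmetic.
    columns = sum(1 for x in range(m + 1) if abs(2 * x - m) <= 2 * ell)
    return Fraction(columns * (m + 1), total)


for m in (10, 100, 1000):
    print(m, removed_fraction(m, ell=1))
```

In this toy case the removed fraction behaves like a constant times $1/m$, which is the decay the Cheeger argument relies on.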

1.3. The model problem in algebraic statistics. A typical problem in algebraic statistics is 
to random walk on all $k \times k$ contingency tables with fixed row and column sums $m$. This 
is encoded as $Ax = mb$ where $A$ is the design matrix. Both $A$ and $b$ are kept fixed while $m$ 
is allowed to grow. The non-negative integer vectors $x$ encode the contingency tables. The 
non-negativity condition provides linear inequalities for a usually not full-dimensional 
polytope, but after restricting to the correct ambient space we have an instance of our 
model problem. For different $m$ we might get different polytopes $P$, but one can deduce 
that they fall into a finite number of versions of our model problem. 
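
As a minimal sketch of the encoding $Ax = mb$, the Python code below builds a design matrix for $k \times k$ tables and checks one table against it. The row-by-row flattening of the table, the choice of $b$ as the all-ones vector, and the names `design_matrix` and `margins` are assumptions made for illustration.

```python
def design_matrix(k):
    """Rows 0..k-1 pick out table rows, rows k..2k-1 pick out table columns."""
    A = [[0] * (k * k) for _ in range(2 * k)]
    for i in range(k):
        for j in range(k):
            A[i][i * k + j] = 1      # entry (i, j) contributes to row sum i
            A[k + j][i * k + j] = 1  # entry (i, j) contributes to column sum j
    return A


def margins(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


table = [[1, 2, 0],
         [0, 1, 2],
         [2, 0, 1]]
x = [entry for row in table for entry in row]  # flatten row by row
A = design_matrix(3)
m, b = 3, [1] * 6  # assumed right-hand side: all margins equal to m
print(margins(A, x) == [m * bi for bi in b])
```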

1.4. Relation to other work. Windisch has generalised and provided a detailed analysis 
of several aspects of our results, and provided an independent construction of expanders 
for contingency tables in his preprint [W]. 

2. The main construction 

We want to construct expanders for our model problem. In the construction we will 
refer to the example illustrated in Figures 1 and 2. 

2.1. Input data. We are provided the following data: 

(i) A non-empty finite set $S = \{s_0, s_1, \ldots, s_{n_S - 1}\}$ of rational points in $[0,1)^d$. 

(ii) A full dimensional polytope $P$ in $\mathbb{R}^d$ defined by rational linear inequalities. This is 
the pale red filled triangle in Figure 1. 

For every positive integer $m$ we construct a graph $G_m$. The first part of the construction 
is independent of $m$ and only calculated once - this is similar to the spirit of a Markov 
basis. 

2.2. The part independent of $m$. 

(iii) Push each of the hyperplanes defining $P$ outwards by at least $\sqrt{d}/2$ to get a polytope 
$Q$ defined by rational linear inequalities. This is the blue triangle in Figure 1. 

(iv) Define a connected $d_H$-regular graph $H$ whose $n_H$ vertices are in bijection with the set 
$\{z \in \mathbb{Z}^d \mid z + (1/2, \ldots, 1/2) \in Q\}$. There are several properties of $H$ that would be beneficial 
but aren't necessary: preferably the neighbours of any vertex should be fast to calculate 
on demand, instead of keeping $H$ in memory. It is also preferable if the vertex degree 
is small. The brute force way to construct $H$ is as a complete graph. The recommended 
and more gentle way is to use Markov bases, and if necessary add multiple self-loops at 
vertices to make it regular. The big green dots in Figure 1 are the vertices of $H$. 

(v) We denote the absolute value of the second largest eigenvalue of $H$ by $\lambda_H$. It is 
advisable to have a reasonable upper bound for $\lambda_H$ at this point: $\lambda_H + 2n_H^{-1/2}$ should be 
smaller than one for this construction to achieve expansion as $m \to \infty$. In practice $\lambda_H$ is 
usually fine when $H$ is small and derived from a Markov basis. One can always make $H$ 
denser and push $\lambda_H$ towards $2n_H^{-1/2}$. 
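
The vertex set of $H$ from steps (iii) and (iv) can be enumerated directly. The Python sketch below does this for a hypothetical example that is not taken from the figures: $P$ is the triangle $x \ge 0$, $y \ge 0$, $x + y \le 1$, and the rational offsets used to build $Q$ are one admissible choice among many. The names `inside` and `vertices_of_H` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example data: each pair (a, b) encodes the inequality a . x <= b.
P = [((-1, 0), Fraction(0)), ((0, -1), Fraction(0)), ((1, 1), Fraction(1))]

# Rational offsets pushing every facet outwards by at least sqrt(2)/2:
# 3/4 >= sqrt(2)/2 for the unit normals, 5/4 >= (sqrt(2)/2) * |(1, 1)| = 1.
OFFSETS = [Fraction(3, 4), Fraction(3, 4), Fraction(5, 4)]
Q = [(a, b + t) for (a, b), t in zip(P, OFFSETS)]


def inside(point, inequalities):
    """Check that the point satisfies every inequality a . x <= b."""
    return all(sum(ai * pi for ai, pi in zip(a, point)) <= b
               for a, b in inequalities)


def vertices_of_H(inequalities, box=range(-3, 5)):
    """Integer points z in a search box whose cube centre z + (1/2, 1/2) lies in Q."""
    half = Fraction(1, 2)
    return [z for z in product(box, repeat=2)
            if inside((z[0] + half, z[1] + half), inequalities)]


H_VERTICES = vertices_of_H(Q)
print(len(H_VERTICES), H_VERTICES)
```

Exact rational arithmetic is used so that points on the boundary of $Q$ are classified without rounding issues. The search box is an assumption that has to contain $Q$.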

2.3. The part dependent on $m$. 

(vi) Take a regular graph (expander) $E$ with minimal (absolute) second eigenvalue $\lambda_E$, 
subject to having $n_E = m^d$ vertices and vertex degree $n_H$. For large $n_E$ we have 
that $\lambda_E \approx 2n_H^{-1/2}$, see [HLW]. 

FIGURE 2. Illustration of changing from $m = 2$ to $m = 6$ in the main construction. 


(vii) Let $G_m$ be the zig-zag product of $E$ and $H$; see [RVW] for the definition 
and basic properties of the zig-zag product. The vertex set of $G_m$ is $V(E) \times V(H)$, it is a 
$d_H^2$-regular graph, and the absolute value of its second eigenvalue satisfies $\lambda_{G_m} < \lambda_E + \lambda_H$. 

(viii) By construction $G_m$ has $n_S \cdot m^d$ vertices and there is a bijection $\phi_1 : V(E) \to 
\{0,1,\ldots,n_S-1\} \times \{0,1,\ldots,m-1\}^d$. We can assume that $V(E)$ is represented by $\{0,1,\ldots,n_E-1\}$ 
and do this fast by modular arithmetic. Compose $\phi_1$ with $\phi_2 : (h, x_1, \ldots, x_d) \mapsto ms_h + 
(x_1, \ldots, x_d)$. This gives a bijection to the translated points of $S$ situated inside big cubes 
whose middle points are inside $Q$. These points are the small red points inside the jagged 
green triangle in Figure 1. In Figure 1 we had $m = 2$. The local transition from $m = 2$ to 
$m = 6$ is drawn in Figure 2. 

(ix) Each of the points of $(S + \mathbb{Z}^d) \cap mP$ is given by a vertex of $G_m$, and we refer to them 
as the relevant vertices. The other vertices are irrelevant vertices. Note that as $m \to \infty$ the 
proportion of irrelevant vertices tends to a constant $\varepsilon > 0$. 
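
Steps (viii) and (ix) can be sketched in Python as follows. Several things here are assumptions and not statements from the text: the example triangle $P$ with $S = \{(0,0)\}$, the particular enlarged triangle $Q$, the reading that $V(E)$ has $n_S m^d$ elements (taken from the codomain of $\phi_1$), and the reading that the pair of an $E$-vertex and an $H$-vertex $z$ encodes the point $s_h + x + mz$. All function names are hypothetical.

```python
from itertools import product

# Hypothetical example data: P is the triangle x >= 0, y >= 0, x + y <= 1,
# stored as pairs (a, b) meaning a . x <= b, and S = {(0, 0)}.
P = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 1)]
S = [(0, 0)]

# Integer points z whose cube centre z + (1/2, 1/2) lies in the enlarged
# triangle Q: x >= -3/4, y >= -3/4, x + y <= 9/4 (an assumed choice of Q).
H_VERTICES = [z for z in product(range(-1, 3), repeat=2) if z[0] + z[1] <= 1]


def decode(index, n_s, m, d):
    """Split an index in {0, ..., n_s * m**d - 1} into (h, (x_1, ..., x_d))."""
    x = []
    for _ in range(d):
        index, digit = divmod(index, m)
        x.append(digit)
    return index, tuple(x)  # the remaining quotient is h


def point(e_index, z, m):
    """Point of S + Z^d encoded by the pair (e_index, z), under the assumed reading."""
    h, x = decode(e_index, len(S), m, len(z))
    return tuple(S[h][i] + x[i] + m * z[i] for i in range(len(z)))


def is_relevant(p, m):
    """Evaluate the linear inequalities of mP at the point p."""
    return all(sum(ai * pi for ai, pi in zip(a, p)) <= m * b for a, b in P)


def count_relevant(m):
    """Number of relevant vertices among all pairs (e_index, z)."""
    n_e = len(S) * m ** 2
    return sum(is_relevant(point(e, z, m), m)
               for z in H_VERTICES for e in range(n_e))


for m in (2, 3, 6):
    print(m, count_relevant(m), len(H_VERTICES) * len(S) * m ** 2)
```

The second printed number counts the relevant vertices and the third counts all pairs, so their ratio shows how large the discarded proportion is in this toy example.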

2.4. Running the MCMC. 

(x) We random walk on $G_m$ and for each vertex we test whether it is relevant by evaluating 
the linear inequalities of $mP$. The irrelevant ones are discarded, and for large $m$ this is a 
constant proportion. We may choose $H$ to achieve that $\lambda_H + 2n_H^{-1/2} < 0.99$ for large enough 
$m$. This shows that the second eigenvalue of $G_m$ is bounded away from 1 by a constant as 
$m \to \infty$, or equivalently: 

2.5. Theorem. — The constructed Markov chain is rapidly mixing. 

2.6. General remarks. If the proportion of discarded vertices is too high, it can be 
pushed arbitrarily close to zero by replacing $P$ by $m'P$ for some big positive integer 
$m'$. In practical examples this is a bad idea; it is surprisingly efficient in examples from 
algebraic statistics to discard a huge proportion, like 99.99%, because the Markov basis 
inside the main construction might be much easier to calculate then. If we are prepared to 
discard even more, the blue triangle in Figure 1 might be replaced by a box or some other 
polytope containing $P$ that is easy to understand; compare to the extension methodology 
in optimisation theory. 

An obvious extension is to not scale P by m in all axes but to have different scalings 
depending on the direction. It works fine to modify the main construction to achieve this, 
but we didn't spell it out since it doesn't add anything conceptually. 


It is tempting to ask if the construction could be recursed, with $H$ coming from it, 
and repeated several times. This is possible, but not necessary. The best constructions of 
expanders already contain a repeated use of the zig-zag product to build them from 
simple starting graphs. Also, referring to the theory of expanders, see [HLW] and [RVW], 
one shouldn't think of building them in memory but rather of building an algorithm that 
is used each time the random walk takes a step. Some of these algorithms are very 
explicit and we believe that for particular classes of problems from algebraic statistics it 
should be possible to write down the explicit zig-zag products with the explicit $H$ graphs 
from Markov bases, and put this into industry grade software such as R or Matlab. 
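
To illustrate what "an algorithm used at each step" can look like, here is a minimal Python sketch of a zig-zag step computed on the fly from rotation maps, in the style of the definition in [RVW]. The two small graphs are placeholders chosen for brevity and are not the graphs $E$ and $H$ of the main construction: the big graph is the complete graph $K_4$ (3-regular) and the small graph is the cycle $C_3$ (2-regular on 3 vertices). All function names are hypothetical.

```python
from itertools import product


def rot_complete(v, i, n=4):
    """Rotation map of K_n: edge number i at v leads to v + i + 1 (mod n)."""
    return (v + i + 1) % n, n - 2 - i


def rot_cycle(k, i, n=3):
    """Rotation map of the cycle C_n: edge 0 steps forwards, edge 1 backwards."""
    return ((k + 1) % n, 1) if i == 0 else ((k - 1) % n, 0)


def zigzag_rot(vertex, label):
    """One step in the zig-zag product of K_4 (big) and C_3 (small)."""
    (v, k), (i, j) = vertex, label
    k1, i1 = rot_cycle(k, i)      # zig: small step inside the cloud of v
    w, l1 = rot_complete(v, k1)   # jump along edge number k1 of the big graph
    l, j1 = rot_cycle(l1, j)      # zag: small step inside the cloud of w
    return (w, l), (j1, i1)


LABELS = list(product(range(2), repeat=2))  # 2^2 = 4 edge labels per vertex


def reachable(start):
    """Vertices reachable from start, found by repeatedly applying zigzag_rot."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for label in LABELS:
            nxt, _ = zigzag_rot(u, label)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(len(reachable((0, 0))))
```

A random walk step then amounts to drawing a label uniformly from `LABELS` and calling `zigzag_rot`; no adjacency structure of the product graph is ever stored.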


2.7. Remarks related to algebraic statistics. The following interpretation is conceptually 
correct, but not completely correct. Consider the problem of $4 \times 4$ contingency tables 
with row and column sums $m$. Let the floor of a table be the table obtained by replacing each entry $r$ by 
$\lfloor 3r/m \rfloor$. The entries of the floor of a table will be 0, 1, 2 or 3. Note that we will never have 
2,2,2,2 in a row or column of the floor of a table. The graph $H$ in the construction should 
be thought of as a Markov chain on the floors of the contingency tables. The expander $E$ 
is on $m^{3^2}$ vertices because there are $3^2$ degrees of freedom. To get a table back from the 
floor and the expander, one would multiply the floor by $m$, add the expander values on 
the top $3 \times 3$ square, and then equate to get the correct row and column sums. Sometimes 
numbers would become negative when equating, and this corresponds to falling outside 
the polytope, that is, being an irrelevant vertex. 
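
The floor map is simple enough to state as code. The following Python sketch applies it to one made-up $4 \times 4$ table with all margins equal to $m = 12$; the table and the name `floor_table` are illustrative assumptions.

```python
def floor_table(table, m):
    """Replace each entry r by floor(3r/m), using exact integer division."""
    return [[(3 * r) // m for r in row] for row in table]


m = 12
table = [[6, 3, 2, 1],
         [3, 6, 1, 2],
         [2, 1, 6, 3],
         [1, 2, 3, 6]]
print(floor_table(table, m))
```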


To be completely clear, we should also clarify that the theory of expansion focuses on 
the case when the asymptotic distribution is uniform. Sometimes in algebraic statistics that is 
not the original setting, and it's not clear how to transfer results to that setup. For the 
hypergeometric setting there is a natural way to extend the weights to the irrelevant 
vertices, and in (small) computational experiments by the author it seems to 
mix better than in the uniform case. On occasion the author has also conjectured that 
to hold in general. 


In 1.2 we explained why the original Markov bases setup doesn't provide rapid mixing. 
In practice it can anyway be interesting to understand the mixing time, in particular in the 
hypergeometric setting. For a transition matrix $T$ with stationary distribution $v$, numerical 
evidence of slow mixing is provided by a vector $w$ that is far from $v$ while $Tw$ is close 
to $w$. In this remark we explain a heuristic for finding such a $w$. The Markov bases are 
frequently derived from Gröbner bases, see [DSS]. In a Gröbner basis there is an order 
on the variables, and that order can be specified as a weight order on the monomials. 
The vertices of our graphs have coordinates, and the coordinates are exponents of the 
monomials. Thus, there is an induced weight $\omega$ on the coordinates, and it is actually a 
linear form taking rational values. For some $r$ about half of the vertices get a weight $\omega$ 
larger than $r$. In many problems of algebraic statistics there are symmetries that make the 
choice of $r$ easy. Define $w$ to have value 1 for vertices with weight $\omega > r$ and value $-1$ for 
vertices with weight $\omega < r$. Rescale $w$ if necessary. In 1.2 we cut a polytope containing the 
graph by a codimension one hyperplane; this is an explicit construction of that, where 
$\omega = 0$ defines the hyperplane. 
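
The heuristic can be tried on the simplest possible chain. In the Python sketch below everything is an assumption made for illustration: the state space is the path $\{0, \ldots, n-1\}$ with moves $\pm 1$ (a one-dimensional stand-in for a Markov basis walk with uniform stationary distribution), the weight is $\omega(i) = i$, the threshold $r$ is the midpoint, and closeness of $Tw$ to $w$ is measured by the Rayleigh quotient.

```python
def apply_T(w):
    """Apply the walk on the path {0, ..., n-1}: move -1 or +1 with probability 1/2,
    staying put when the move would leave the path."""
    n = len(w)
    return [0.5 * w[max(i - 1, 0)] + 0.5 * w[min(i + 1, n - 1)] for i in range(n)]


def cut_vector(n, r):
    """w = +1 where the weight omega(i) = i exceeds r, and -1 where it is below r."""
    return [1.0 if i > r else -1.0 for i in range(n)]


def rayleigh_quotient(w):
    """Return (w . Tw) / (w . w); values near 1 mean that Tw is close to w."""
    Tw = apply_T(w)
    return sum(a * b for a, b in zip(w, Tw)) / sum(a * a for a in w)


n = 100
w = cut_vector(n, r=(n - 1) / 2)
print(rayleigh_quotient(w))
```

Here $T$ is symmetric and $w$ sums to zero, so the quotient is a lower bound for the second largest eigenvalue of $T$. In this toy chain $Tw$ differs from $w$ only at the two states next to the cut.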